AMENDMENT NO. 1 JULY 1993

TO 
IS 7200 ( Part 1 ) : 1989 PRESENTATION OF 
STATISTICAL DATA 

PART 1 TABULATION AND SUMMARIZATION

( Second Revision ) 

( Page 4, clause 7.3, para I ) — Insert the following at the end of 
this para: 

'Uses of histogram have been explained with practical examples in 
Annex B.' 

( Page 11, Annex B ) — Insert the following in the existing Annex B: 

'ANNEX B 

( Clause 7.3 ) 

USES OF HISTOGRAM 

B-1 ADVANTAGES OF HISTOGRAM

A histogram can be effectively used for process control. It has the 
following advantages: 

a) It gives the graphical representation of the distribution; 

b) The mean and standard deviation of the distribution can be 
calculated easily; and 

c) Histograms after stratification are useful to obtain clues for
process improvement.

B-2 PRACTICAL APPLICATIONS 

B-2.1 Identification of Problems by the Shape of the Distribution 

B-2.1.1 As explained in 7.4, after a histogram is prepared, the midpoint 
of each class interval is connected by free hand and a smooth curve 
is drawn without following the unevenness of the histogram. If the 
distribution is normal, this curve usually exhibits a bell-like form 
( see Fig. 4 ).

In contrast, non-normal distribution curves such as in A-E of Fig. 5 
are sometimes obtained. Attention shall be paid to the following points 
in the identification of problems on the basis of these histograms. 

A. When there is no symmetry, this type of distribution occurs. For
example, in the distribution of particle size of granules produced by
grinding, the number of large-size granules decreases rapidly while
the small-size granules exhibit a long-tailed distribution. This is
because the number of large granules is small in the raw material and
fine ones occur in a natural way even before the grinding operation.

B. When different types of data are mixed, this kind of distribution
arises. The shape suggests that stratification may be possible
operator-wise or instrument-wise.

C. This type of distribution often appears when measurements are
taken by an inaccurate method or the class-width is not an integral
multiple of the measuring unit or any other similar factor or cause
depending upon the situation.

D. This is a truncated distribution which arises when measurements of
only those items which meet the specifications are recorded or when
the inspection department makes allowances for substandard articles
to meet the required quantity. It also appears when the data below a
certain level are removed intentionally or when saturation is reached
in a chemical reaction because there are no data in the region beyond
the saturation point.

E. This type of distribution is attributable to differences in quality
of raw materials or process abnormalities.

Fig. 5 Non-Normal Distribution Curves



B-2.2 Comparison with Specification and Target Values 

B-2.2.1 When specification limits and target values are known, these 
shall be entered in the histogram and it shall be examined whether the 
actual data distribution is satisfactory. The mean should be as close 
to median as possible. 

B-2.2.2 Comparison of the histogram with specification limits is 
explained by taking A-G in Fig. 6 as examples. 

A. The distribution of data is within the specification limits with a 
considerable margin on both sides. One of the measures is to make 
the tolerance tighter to improve the quality of the product. 

The control limits of the process may be relaxed if reduced cost or 
increased productivity can be expected. 

B. The distribution of data is within the specification limits, but the 
mean ( μ ) is shifted towards the lower specification limit. Therefore,
a slight decrease in the process average may result in off-specification
items. When this type of distribution appears, steps shall be taken
to increase the process mean.

C. The distribution of data is barely within the specification limits. 
Even a slight change of the mean ( μ ) may produce off-specification
items. When this type of distribution appears, measures shall be taken 
to reduce the variability of the process or the appropriateness of the 
specification limits shall be reviewed. 

D. The distribution of data is beyond the specification limits. When 
this type of distribution appears, action shall be taken immediately. 
Taking into consideration the quality required by the user and the 
capability of the processes, the appropriateness of the specification 
limits shall be reviewed at first. If the quality required by the user 
and the process capability can be compromised when the specification 
range is widened to 8 times the standard deviation, revision of the 
specifications is one of the solutions. If the shape of the histogram 
is considerably flat with no distinct peak, or like B and C of Fig. 5, 
the cause shall be identified and removed. 

Process improvement may be required if the tolerance can be reduced 
and the distribution is normal. 

E. The distribution of the data is barely within the specification 
limits, and the mean is shifted to the left. This type of distribution 
may be due to three causes: 

a) The product below the lower specification limit would not 
appear under certain manufacturing conditions. 

b) The products below the lower specification limit might have 
been eliminated by 100 percent inspection. 
Fig. 6 Various Histograms



c) Off-specification products might have been included intentionally
within the specification limits. This may possibly occur
when measurement is left to the manufacturing department and
the rate of off-specification products is controlled strictly.

Action shall be taken to increase the mean to the centre of the
specification range and reduce the variability of the distribution.

F. The range of the distribution is smaller than that of the specifica- 
tion limits, but data are distributed beyond the lower specification 
limit because the midpoint is deviated to the left. Action shall be 
taken to increase the mean. 

G. Most of the observations are distributed within the specification
limits and the target value is also at the midpoint of the specification
range. But some observations lie beyond the lower specification
limit. We have the distribution like E of Fig. 5. As this indicates
non-normality, the cause shall be detected and removed.

B-2.3 Identification of Causes by Stratification

B-2.3.1 When the range of distribution is large and the rate of
off-specification items is high, like D of Fig. 6, stratification may be used
for the identification of causes. The cause that influences the most 
shall be identified and the histogram shall be prepared after stratifica- 
tion with respect to the identified cause. 

When stratification with respect to one cause is insufficient, it shall be 
repeated with others. 

Examples of histograms after stratification are shown in Fig. 7 and 8. 

There are two peaks in the histogram ( A ) in Fig. 7. Since this histo- 
gram is drawn on the basis of observations on the products manufac- 
tured by two machines, stratification by the first and second machine 
is carried out to obtain the two histograms ( B ) and ( C ) of Fig. 7. 

Stratification reveals that the range of the distribution of the product 
manufactured by the first machine is more than that by the second 
machine and its mean is lower. 
Fig. 7 Histograms After Stratification by Machines

B-2.3.2 Example

The finishing accuracy of a metal roll 200 mm in diameter is specified
to be under 10/1 000 mm, but off-specification articles amounted to 8
percent. The first histogram in Fig. 8 shows the data of the products
classified in the classes with the width of 1/1 000 mm.

This histogram shows apparently that the mean is deviated to the left 
and the tail is extended gradually to the right. 

The finishing of the roll was carried out by two operators, A and B.
Accordingly, two independent frequency distribution tables were
prepared individually for A and B and histograms after stratification
were prepared as shown in the second histogram of Fig. 8. In this
graph, the solid line indicates the data of A and the dotted line that
of B. The number of observations for A is 41 and that for B is 109.
These overlapping histograms show that the products manufactured
by A are within the specification limits, but those by B are widely
distributed and B is responsible for all the off-specification articles.

Action must be taken to improve the working accuracy of operator B.
When the working practice of B was examined carefully, it was found
that B did not attach a roll to the grinder properly. His accuracy was
improved to a large extent when he attached a roll in the same
way as operator A.
Fig. 8 Stratified Histograms of Finishing Accuracy of Outside
Diameter of Rolls Between Two Operators

FOREWORD

This Indian Standard ( Part 1 ) was adopted by the Bureau of Indian Standards on 30 June 1989,
after the draft finalized by the Quality Control and Industrial Statistics Sectional Committee had
been approved by the Basic Standards, Systems and Services Division Council.

Numerical data arise in modern industrial operations at a number of stages, for example, in the 
development of new methods or new products, in reviewing past experience for setting up standards,
in assessment of quality of incoming materials, in controlling quality during the manufacturing 
processes, in inspection and testing of end products, in surveys of performance in actual use, and 
in the study of consumer preferences. It is important to ensure that only such data as are really 
useful and necessary are collected and most effective use is made of them. 

Once the requisite data are collected, it becomes imperative that the same are properly condensed 
and presented to facilitate the drawing of correct inferences or conclusions. Depending on the 
need of the situation, the data may be tabulated and summarized or condensed graphically or 
pictorially. Accordingly, this part of the standard deals with the tabulation and summarization
of the statistical data whereas Part 2 deals with the diagrammatic representation of data.

This standard was originally issued in 1974 with a view to providing the procedures for recording 
and summarization of data, and computation of quantitative measures of central tendency and 
dispersion. The first revision was taken up in 1981 with a view to removing lacunae observed 
while making class intervals. The present ( second ) revision has been taken up with a view to
deleting the clause on 'normal distribution' so that it may be included in IS 9300 ( Part 2 ) : 1979
'Statistical models for industrial applications : Part 2 Continuous models'. The Committee also felt
that the various examples on preparation of frequency table, cumulative frequency tables, and
calculation of mean and standard deviation may be changed by taking the live data from the industry.
The opportunity has also been taken to utilize the experience gained over these years and to 
streamline the standard with others published in the meantime. 

In reporting the result of a test or analysis, if the final value is to be rounded off, it shall be done
in accordance with IS 2 : 1960 'Rules for rounding off numerical values ( revised )'. 
PRESENTATION OF STATISTICAL DATA



PART 1 TABULATION AND SUMMARIZATION 



( Second Revision ) 



1 SCOPE 

1.1 This standard (Part 1) outlines the proce- 
dures for recording the data, preliminary scrutiny 
for elimination of discrepancies and mistakes 
in data, summarization of data by means of 
histograms, frequency distributions and com- 
putation of quantitative measures of central 
tendency and dispersion. 

2 REFERENCES

2.1 The following Indian Standards are necessary
adjuncts to this standard.

IS No.                       Title

IS 7200 ( Part 2 ) : 1975    Presentation of statistical data: Part 2
                             Diagrammatic representation of data
IS 7920 : 1985               Statistical vocabulary ( revised )
IS 8900 : 1978               Criteria for rejection of outlying
                             observations

3 TERMINOLOGY

3.1 For the purpose of this standard, the definitions
of various statistical terms shall be as given
in IS 7920 : 1985.

4 RECORDING OF DATA

4.1 A proforma should be devised to record the
items of information required together with
certain ancillary information needed for
identification and verification. The units of
measurement and the degree of accuracy of figures
needed should be clearly specified beforehand.
Instructions for filling in the proforma,
particularly the marginal observations, should be
clearly stated as footnotes at the bottom or
reverse of the proforma. The sequence in which
the observations are made should be preserved;
the date and time of observation, the observer's
name or identity number, and the instruments
used should be noted. As an illustration, the
recording of data for moisture content of
biscuits is presented in Annex A.

5 SCRUTINY AND MINIMIZATION OF ERRORS

5.1 Discrepancies and errors may arise in the
course of recording of the basic data,
summarization, calculation of statistical measures of
location and dispersion and presentation of final
results. As far as possible, built-in checks should
be provided at each of these stages. The persons
handling the data should be suitably trained.
An extreme observation should be carefully
examined before acceptance or rejection and
helpful guidance in this regard may be obtained
from IS 8900 : 1978.



6 FREQUENCY DISTRIBUTION 

6.1 It is desirable to present the original data 
in a suitable form as in Annex A for clearer 
understanding. However, in many situations 
this may not be possible and another method 
for summarization of data may become impera- 
tive. Even when the data can be presented in 
full, it is rarely that the results or conclusions 
can be obtained without condensation of data 
and further analysis.

6.2 One simple and very useful way of summari- 
zing numerical data is by preparing a frequency 
table. In the case of attributes type of data, 
the summarization is done by noting the number 
of items in the two categories, namely, good and 
bad, so that the sum of items falling in these two 
categories is equal to the total number of items 
inspected. 

6.3 In the case of discrete or continuous varia- 
bles, the summarization is achieved by forming 
classes within which the individual observation 
differs little from one another. The full set of 
data can then be replaced by a table which 
shows the number of original observations falling 
within each class. 

6.4 No rigid rules can be laid down in the
formation of class intervals, but the following 
broad guidelines may be helpful in most of the 
practical situations: 

a) For a frequency distribution to show up any definite
pattern, the total number of observations
should not be less ( preferably more )
than 100.

b) As far as practicable the intervals should 
be of equal width for better graphical 
representation and easier computation 
of the mean, standard deviation, etc. 

c) From the maximum and the minimum 
values found in the data, the range (R) 
may be obtained and divided into suitable 
number of smaller class intervals of equal 
width. The total number of class intervals
( k ) may vary from 7 to 15 depending
on the nature of the data and the
number of observations. The relationship
is as follows:

$$k = \text{Integral part of} \left[ \frac{\text{Range } (R)}{\text{Class width } (C)} \right] + 1$$

Once the value of the range of the entire 
data is known and also that the total 
number of class intervals ( k ) should be 
from 7 to 15, the class width ( C ) may be 
obtained from the above relation. 

It is not advisable to increase unneces- 
sarily the number of class intervals so as 
to accommodate one or two extreme 
observations. Such observations can be
classified conveniently in the first ( last )
class interval which may be written as
'less ( greater ) than or equal to a certain
value', leading to the first ( last ) class interval being an open
class interval. The class interval should
neither be so large that considerable error 
may be introduced in further calculations, 
nor should it be too small to result in too 
many class intervals having zero or very 
small frequencies. 

d) The class interval should be defined in 
such a way that each observation belongs 
to one and only one class. This may be 
achieved by specifying class limits to one 
more decimal place than that obtained 
in the set of observations to be classified. 
Alternatively, some arbitrary rule can be 
fixed in advance with regard to the alloca- 
tion of observations coinciding with the 
class limits. This rule shall be spelt out 
before summarization is undertaken. 

e) It is desirable to choose the class inter-
vals in such a way that the mid-point of 
the intervals is a convenient figure for 
calculation and plotting. Where there is 
a tendency for certain values ( for exam- 
ple, multiples of 5 or 10) to occur much 
more frequently than the neighbouring 
values, the classes should be so arranged 
that these values occur at about the 
middle of the corresponding intervals. 

f) Observations, which are to be grouped,
should not be rounded off before grouping,
since grouping is a form of rounding
and successive rounding is liable to lead
to systematic errors.

6.5 The method of grouping observations may 
be illustrated by summarizing the data given 
in Annex A. The first step is to decide what 
intervals to use. In this example, the minimum
and the maximum values from the data are
obtained as 0.7 and 5.3 respectively.

$$\text{Range } (R) = 5.3 - 0.7 = 4.6$$

$$\text{Number of class intervals } (k) = \text{Integral part of} \left[ \frac{4.6}{C} \right] + 1$$

If C = 0.5, then k = 10.

Hence, if 0.5 is chosen as the width of each class
interval, then there will be 10 class intervals.

As the values of moisture content are given up to the
first place of decimal, the class intervals may be
formed with two places of decimal so that each
observation belongs to one and only one class
interval. Also, as the minimum value in the data
is 0.7 and the class width has been chosen as
0.5, the first class interval may be formed as
0.55-1.05. Similarly, the subsequent class
intervals may be formed with class width of
0.5 till the maximum value is also included in
the class interval ( see first col of Table 1 ).
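The choice of class intervals in 6.5 can be sketched in a few lines of Python. This is only an illustration: the function name `class_intervals` and its parameters are assumptions made here, and the lower limit of the first class (0.55) is passed in as the analyst's choice rather than derived. Decimal arithmetic is used so that the limits print exactly as written in the text.

```python
from decimal import Decimal

def class_intervals(minimum, maximum, width, first_lower_limit):
    """Return (k, limits) for equal-width classes.

    k follows the relation in 6.4 c): integral part of R / C, plus 1.
    first_lower_limit is chosen by the analyst (0.55 in 6.5); it is an
    input here, not something this sketch derives.
    """
    lo, hi, c = Decimal(minimum), Decimal(maximum), Decimal(width)
    start = Decimal(first_lower_limit)
    k = int((hi - lo) / c) + 1  # int() keeps only the integral part
    limits = [(start + i * c, start + (i + 1) * c) for i in range(k)]
    return k, limits

# Values from 6.5: minimum 0.7, maximum 5.3, class width 0.5
k, limits = class_intervals("0.7", "5.3", "0.5", "0.55")
for lower, upper in limits:
    print(f"{lower}-{upper}")
```

With these inputs the sketch gives 10 classes running from 0.55-1.05 to 5.05-5.55, so the maximum value 5.3 falls in the last class.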

For each observation in the set of data, a tally 
mark is put against the relevant class interval. 
For ease of counting, these marks are arranged
in blocks of 5, the fifth tally mark being drawn 
across the other four. Table 1 shows the arrange- 
ment. When every observation has been allo- 
cated to a class, the tally marks in each class are 
counted to get the frequency of that class. 

6.6 Frequencies may be expressed either as 
actuals or as relative frequencies. The relative 
frequencies may be expressed in decimals or as 
percentages of the total (see Table 2). 
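As a small illustration of 6.6, the relative frequencies can be computed from the actual frequencies of the moisture content example. The variable names below are assumptions of this sketch; the frequencies are those of the ten class intervals in the example.

```python
# Actual frequencies of the ten class intervals (0.55-1.05 ... 5.05-5.55)
frequencies = [8, 16, 32, 40, 52, 43, 32, 16, 7, 4]

total = sum(frequencies)

# Relative frequency of each class as a percentage of the total,
# rounded to one decimal place for presentation
relative_percent = [round(100 * f / total, 1) for f in frequencies]

for f, p in zip(frequencies, relative_percent):
    print(f"{f:4d} {p:6.1f}")
```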

6.7 For some purposes it is specially useful to 
know the number of observations which are 
less than ( or greater than ) particular values. 
A table which gives frequencies in this form is
called a cumulative frequency table. A cumu- 
lative frequency table is built up readily from a 
frequency table as follows: 

In 'less than' cumulative tables, the frequency of
the first class interval is written against the 
upper limit of the first class interval. The fre- 
quency of the second class interval is added to 
that of first class interval and their sum is record- 
ed against the upper limit of the second class 
interval. Each class frequency is added in turn 
and the cumulative sum is recorded at each step 
until the whole table is completed ( see Table 3 ). 
To obtain the cumulative table for observations
greater than the specified values, the procedure
is as described except that a start is made at the
bottom of the table ( corresponding to the last
class interval ) ( see Table 4 ).

NOTE - In the case of 'less than' cumulative tables,
the upper limits of class intervals are to be used
whereas for 'more than' cumulative tables, the lower
limits of class intervals are relevant.

Table 1 Frequency Table

Class Interval   Tally Marks                                                       Frequency

0.55-1.05        ///// ///                                                              8
1.05-1.55        ///// ///// ///// /                                                   16
1.55-2.05        ///// ///// ///// ///// ///// ///// //                                32
2.05-2.55        ///// ///// ///// ///// ///// ///// ///// /////                       40
2.55-3.05        ///// ///// ///// ///// ///// ///// ///// ///// ///// ///// //        52
3.05-3.55        ///// ///// ///// ///// ///// ///// ///// ///// ///                   43
3.55-4.05        ///// ///// ///// ///// ///// ///// //                                32
4.05-4.55        ///// ///// ///// /                                                   16
4.55-5.05        ///// //                                                               7
5.05-5.55        ////                                                                   4

TOTAL                                                                                 250
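The running sums described in 6.7 can be sketched with `itertools.accumulate` from the Python standard library. The variable names are assumptions of this sketch; the frequencies are those of the moisture content example.

```python
from itertools import accumulate

# Class frequencies in order of increasing class interval
frequencies = [8, 16, 32, 40, 52, 43, 32, 16, 7, 4]

# 'Less than' table: running sum from the first class, recorded
# against the upper limit of each class interval
less_than = list(accumulate(frequencies))

# 'More than' table: running sum started from the last class, recorded
# against the lower limit of each class interval
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)
print(more_than)
```

The last entry of the 'less than' list and the first entry of the 'more than' list both equal the total number of observations, which is a convenient check on the arithmetic.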



6.7.1 Figure 1 depicts the 'less than' and 'more
than' cumulative curves for the relative frequencies.
These two curves intersect at a point,
the abscissa of which corresponds to the median.

Table 2 Actual and Relative Frequency Distribution

( Clauses 6.6 and 7.3.1 )

Class Intervals     Frequency
                    Actuals     Relative ( Percentage )
(1)                 (2)         (3)

0.55-1.05             8           3.2
1.05-1.55            16           6.4
1.55-2.05            32          12.8
2.05-2.55            40          16.0
2.55-3.05            52          20.8
3.05-3.55            43          17.2
3.55-4.05            32          12.8
4.05-4.55            16           6.4
4.55-5.05             7           2.8
5.05-5.55             4           1.6

Total               250         100.0

Table 3 'Less Than' Cumulative Frequency Distribution ( Actual and Percentage )

( Clause 6.7 )

Moisture Content        Number Below        Percentage Below
( Percent by Mass )     the Given Limit     the Given Limit
(1)                     (2)                 (3)

1.05                      8                   3.2
1.55                     24                   9.6
2.05                     56                  22.4
2.55                     96                  38.4
3.05                    148                  59.2
3.55                    191                  76.4
4.05                    223                  89.2
4.55                    239                  95.6
5.05                    246                  98.4
5.55                    250                 100.0

Table 4 'More Than' Cumulative Frequency Distribution ( Actual and Percentage )

( Clause 6.7 )

Moisture Content        Number Above        Percentage Above
( Percent by Mass )     the Given Limit     the Given Limit
(1)                     (2)                 (3)

0.55                    250                 100.0
1.05                    242                  96.8
1.55                    226                  90.4
2.05                    194                  77.6
2.55                    154                  61.6
3.05                    102                  40.8
3.55                     59                  23.6
4.05                     27                  10.8
4.55                     11                   4.4
5.05                      4                   1.6
Fig. 2 Histogram

Fig. 3 Frequency Polygon

Fig. 4 Frequency Curve

8 CHARACTERISTIC MEASURES OF
FREQUENCY DISTRIBUTION

8.1 The two important characteristics of frequency
distribution are:

a) central tendency, and 

b) spread or dispersion. 

8.1.1 The most widely used measures of central
tendency are the mean and the median. In some 
situations, mode is also used as a measure of 
central tendency. 

8.1.2 The two most useful measures of spread 
are the range and the standard deviation. 

8.2 Measures of Central Tendency 

8.2.1 Mean 

The mean is obtained by dividing the sum of the
observations by the number of observations.
Symbolically, if $x_1, x_2, \ldots, x_n$ denote the n individual
observations, then the mean denoted by
$\bar{x}$ (read 'x-bar') is given by the formula:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

For example, if the values of percentage elongation
at rupture of 8 samples from a batch of
copper wire were obtained as 15.0, 16.0, 16.4,
16.7, 20.9, 19.6, 17.1 and 19.1, the mean elongation
is:

$$\bar{x} = \frac{15.0 + 16.0 + \cdots + 19.1}{8} = \frac{140.8}{8} = 17.6 \text{ percent}$$

8.2.2 Weighted Mean

The weighted mean of the values $x_1, x_2, \ldots, x_n$
having weights $w_1, w_2, \ldots, w_n$ respectively is given
by the formula:

$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$

For example, in a sampling investigation conducted
to determine the quality of a lot of gypsum,
five gross samples were collected and analyzed
for calcium sulphate percentage. The test results
obtained were:

89.09, 86.02, 83.38, 88.69 and 91.76 representing
respectively the tonnages of 89.50, 89.00,
79.25, 72.25 and 72.50.

The quality of lot given by weighted mean is
calculated as follows:

$$\bar{x} = \frac{89.50 \times 89.09 + 89.00 \times 86.02 + \cdots + 72.50 \times 91.76}{89.50 + 89.00 + 79.25 + 72.25 + 72.50}$$

8.2.3 Median

Median is that value which divides a series of
observations arranged in the order of magnitude
so that the number of observations less than the
median is equal to the number of observations
greater than the median. Thus, if the number
of observations is odd, then the median is the
middlemost value when the observations are
arranged in ascending or descending order. However,
if the number of observations is even, then
by convention the median is defined as the mean
of the two middlemost observations.

Thus in a set of 5 observations 11, 12, 14, 16 and
17, the value of the median is 14. In another set
of 10 observations given by 8, 9, 9, 10, 11, 12,
12, 13, 16 and 19, the value of the median is
obtained as ( 11 + 12 )/2 = 11.5.

In certain situations, the median is a better measure
of central tendency than the mean since
the former is not very much affected by the
extreme values found in the data. It has the
added advantage of being very simple to calculate
especially when the number of observations is
odd. On the other hand, mean has the representative
character of the whole data since all
the values enter directly in its computation and
has extensive applications in statistical theory.
However, in the case of small samples usually
encountered in the control chart techniques used
in process control, both are equally efficient.

8.2.4 Mode

Mode is that value of the variable which has the
highest frequency. This measure of central tendency,
though the easiest of the three ( mean,
median and mode ) to find from a frequency
distribution, has fewer applications in quality
control.
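The worked examples of 8.2.1 to 8.2.3 can be checked with a short Python sketch. It uses `statistics.mean` and `statistics.median` from the standard library; the helper `weighted_mean` and all variable names are assumptions introduced here for illustration.

```python
import statistics

# 8.2.1: percentage elongation at rupture of 8 copper wire samples
elongation = [15.0, 16.0, 16.4, 16.7, 20.9, 19.6, 17.1, 19.1]
mean_elongation = statistics.mean(elongation)

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights (8.2.2)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# 8.2.2: calcium sulphate percentage of five gross samples of gypsum,
# weighted by the tonnage that each sample represents
calcium_sulphate = [89.09, 86.02, 83.38, 88.69, 91.76]
tonnage = [89.50, 89.00, 79.25, 72.25, 72.50]
lot_quality = weighted_mean(calcium_sulphate, tonnage)

# 8.2.3: median for an odd and an even number of observations
median_odd = statistics.median([11, 12, 14, 16, 17])
median_even = statistics.median([8, 9, 9, 10, 11, 12, 12, 13, 16, 19])

print(round(mean_elongation, 1), round(lot_quality, 2), median_odd, median_even)
```

Because the heavier sub-lots carry more weight, the weighted mean differs slightly from the simple average of the five test results.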

8.3 Measures of Dispersion 

8.3.1 Range 

The range is the difference between the largest
and the smallest observation obtained in a set of
data. Symbolically, $R = X_{\max} - X_{\min}$, where
R denotes the range, $X_{\max}$ is the largest observation
and $X_{\min}$ is the smallest observation.

8.3.2 Sample Standard Deviation 

This is defined as the square root of the quotient 
obtained by dividing the sum of the squares of 
deviations of the observations from their mean 
by one less than the number of observations in 
the sample. 

Symbolically, if $x_1, x_2, \ldots, x_n$ are the n observations
with the mean value $\bar{x}$, the standard deviation
(s) is given by the formula:

$$s = \left[ \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} \right]^{1/2} = \left[ \frac{\sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 / n}{n - 1} \right]^{1/2}$$

For the data given in 8.2.1, the standard deviation
is:

$$s = \left[ \frac{2\,507.04 - (140.8)^2 / 8}{7} \right]^{1/2} = (4.14)^{1/2} = 2.03$$
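As a check on the arithmetic of 8.3.2, the computational form of the formula (sum of squares minus the squared sum divided by n) can be sketched as follows. The function name `sample_std` is an assumption of this sketch; `statistics.stdev` from the standard library uses the same n - 1 divisor and serves as a cross-check.

```python
import math
import statistics

def sample_std(observations):
    """Sample standard deviation with the n - 1 divisor of 8.3.2."""
    n = len(observations)
    sum_of_squares = sum(x * x for x in observations)
    total = sum(observations)
    return math.sqrt((sum_of_squares - total ** 2 / n) / (n - 1))

# Percentage elongation data of 8.2.1
elongation = [15.0, 16.0, 16.4, 16.7, 20.9, 19.6, 17.1, 19.1]
s = sample_std(elongation)

print(round(s, 2))
```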



8.3.3 Pooled Standard Deviation 



The pooled standard deviation for k samples of
sizes $n_i$, i = 1, 2, ..., k and standard deviations
$s_i$, i = 1, 2, ..., k respectively is defined as
follows:

$$s = \left[ \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{\sum_{i=1}^{k} n_i - k} \right]^{1/2}$$

For example, a lot of sized iron ore was divided
into three sub-lots for determining iron ore content.
The number of unit samples taken from
each sub-lot were 5, 6 and 8 with corresponding
sample standard deviation values (in percent)
1.02, 1.04 and 0.96. The pooled standard deviation
value of the lot is calculated as follows:

$$s = \left[ \frac{4 \times (1.02)^2 + 5 \times (1.04)^2 + 7 \times (0.96)^2}{(5 + 6 + 8) - 3} \right]^{1/2}$$

$$s = (1.001\,3)^{1/2}$$

$$s = 1.00$$
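The pooled standard deviation of the iron ore example can be reproduced with a minimal sketch. The function name `pooled_std` and its argument names are assumptions made for illustration.

```python
import math

def pooled_std(sizes, std_devs):
    """Pooled standard deviation of k samples (8.3.3).

    Each sample variance is weighted by its degrees of freedom (n_i - 1);
    the divisor is the total number of observations minus k.
    """
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sizes, std_devs))
    denominator = sum(sizes) - len(sizes)
    return math.sqrt(numerator / denominator)

# Three sub-lots with 5, 6 and 8 unit samples
s_pooled = pooled_std([5, 6, 8], [1.02, 1.04, 0.96])

print(round(s_pooled, 2))
```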

In case the k samples are drawn from it different 
populations having same standard deviation, the 
value of the pooled standard deviation is given 
by: 



S S3 



L («( — l)s* 

i - 1 



E *(*-.*<)* 



n 

/- 1 



-* 



1/2 



where $\bar{x}_i$ and $s_i$ are the mean and sample standard
deviation of the i th sample and $\bar{x}$ is the pooled mean
of all the k samples.

8.3.4 The range is nearly as 'efficient' as the
standard deviation as an estimate of the lot 
standard deviation when the sample size is small, 
say less than 10. Because of its simplicity, the 
range is extensively used in control chart work. 
However, as the sample size increases beyond 10, 
the efficiency of the standard deviation becomes 
much greater than that of the range which is 
easily affected by the extreme values that may 
exist in the set of data. 

8.3.5 The coefficient of variation CV is defined
as follows:

$$CV = \frac{100\, s}{\bar{x}}$$

where

s = sample standard deviation, and
$\bar{x}$ = absolute mean value.

The coefficient of variation for the data given
in 8.2.1 is:

$$CV = \frac{2.03}{17.6} \times 100 = 11.53 \text{ percent}$$
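A brief sketch of the coefficient of variation for the elongation data of 8.2.1 follows. The function name `coefficient_of_variation` is an assumption of this sketch. The printed example rounds the standard deviation to two decimals before dividing, so a computation carried at full precision differs slightly in the last digits.

```python
import statistics

def coefficient_of_variation(observations):
    """100 * s / |mean|, with s the sample standard deviation (8.3.5)."""
    s = statistics.stdev(observations)
    mean = statistics.mean(observations)
    return 100 * s / abs(mean)

# Percentage elongation data of 8.2.1
elongation = [15.0, 16.0, 16.4, 16.7, 20.9, 19.6, 17.1, 19.1]
cv = coefficient_of_variation(elongation)

print(round(cv, 2))
```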

9 COMPUTATION OF MEAN AND 
STANDARD DEVIATION FROM SAMPLE 
FREQUENCY DISTRIBUTION 

9.1 The computation of mean and standard 
deviation from a larger body of data ( say, con- 
taining more than 50 observations ) can be carri- 
ed out with considerable economy of labour and 
with a degree of accuracy adequate for all practi- 
cal purposes by using a frequency distribution. 

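9.2 If the values $x_1, x_2, \ldots, x_n$ are occurring
with the frequencies $f_1, f_2, \ldots, f_n$ respectively,
the mean value is:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

The standard deviation for this type of data is
obtained as:

$$s = \left[ \frac{f_1 (x_1 - \bar{x})^2 + \cdots + f_n (x_n - \bar{x})^2}{(f_1 + \cdots + f_n) - 1} \right]^{1/2} = \left[ \frac{\sum_{i=1}^{n} f_i x_i^2 - \left( \sum_{i=1}^{n} f_i x_i \right)^2 / \sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} f_i - 1} \right]^{1/2}$$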

NOTE — When the total number of observations is 
large ( more than 50 ), the value of the sample stan- 
dard deviation will not differ much even if the sum 
of square of deviations from mean is divided by the 
total number of observations instead of one less than 
the total number of observations. 

9.3 In the case of a frequency distribution for
a continuous variable with n class intervals of
equal width, the calculation of mean and standard
deviation may be done by the same formulae as
those given in 9.2 by replacing $x_1, x_2, \ldots, x_n$
by the mid-values of the n class intervals.

9.4 The computation of mean and standard deviation
can be simplified considerably by suitably
transforming the original variable and then working
with the new transformed variable. It is
advisable to subtract from each of the mid-values
of the class intervals that mid-value ( say $x_0$ )
which has the maximum frequency and then
divide each such difference by the width of the
class interval (c) so that the transformed values
$d_i = ( x_i - x_0 )/c$ take negative, zero and positive
integral values in a continuous sequence. The
mean and standard deviation of the transformed
values are then calculated by using the formulae
given in 9.2. The mean and standard deviation
for the original data are calculated as:

$$\text{Mean } ( \bar{x} ) = x_0 + c \times \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$$

$$\text{Standard deviation } ( s ) = c \times \left[ \frac{\sum_{i=1}^{n} f_i d_i^2 - \left( \sum_{i=1}^{n} f_i d_i \right)^2 / \sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} f_i - 1} \right]^{1/2}$$



9.4.1 The steps in the computation of mean and
standard deviation by transforming of variables
are illustrated in Table 5. Columns 1 and 3
reproduce the frequency distribution of Table 1.
Column 2 gives the mid-value of the class intervals
and for all further calculations it would be
assumed that these mid-values occur with the
frequencies indicated in col 3. Column 4 gives
the transformed variable obtained by using a
convenient working origin and working unit.
In the present case, the working origin is 2.8
which is the mid-value of the class interval
2.55-3.05 having the maximum frequency. The
working unit is the same as the class width which
is 0.5. Columns 5 and 6 give the values of $f_i d_i$
and $f_i d_i^2$. A typical proforma used for making
frequency table and for computation of mean
and standard deviation from frequency data is
given in Annex B.



Table 5 Computation of Mean and Standard Deviation from Frequency Data

(Clause 9.4.1)

Class        Mid-Value   Frequency   Transformed Value          f_i d_i    f_i d_i^2
Interval     x_i         f_i         d_i = ( x_i - 2.8 )/0.5
(1)          (2)         (3)         (4)                        (5)        (6)

0.55-1.05    0.8           8         -4                          -32        128
1.05-1.55    1.3          16         -3                          -48        144
1.55-2.05    1.8          32         -2                          -64        128
2.05-2.55    2.3          40         -1                          -40         40
                                                    Sub-total   -184
2.55-3.05    2.8          52          0                            0          0
3.05-3.55    3.3          43          1                           43         43
3.55-4.05    3.8          32          2                           64        128
4.05-4.55    4.3          16          3                           48        144
4.55-5.05    4.8           7          4                           28        112
5.05-5.55    5.3           4          5                           20        100
                                                    Sub-total    203

Total                    250                          203 - 184 = 19        967
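The coded computation of Table 5 can be sketched as below, assuming the working origin 2.8 and working unit 0.5 stated in 9.4.1. The variable names are chosen here for illustration, and the final mean and standard deviation are simply what the formulae of 9.4 give for the column totals of the table.

```python
import math

# Columns 2 and 3 of Table 5: class mid-values and frequencies
mid_values = [0.8, 1.3, 1.8, 2.3, 2.8, 3.3, 3.8, 4.3, 4.8, 5.3]
frequencies = [8, 16, 32, 40, 52, 43, 32, 16, 7, 4]

origin = 2.8  # mid-value of the class with the maximum frequency
width = 0.5   # class width, used as the working unit

# Column 4: transformed values d_i = (x_i - origin) / width,
# rounded to remove floating-point noise
d = [round((x - origin) / width) for x in mid_values]

n = sum(frequencies)
sum_fd = sum(f * di for f, di in zip(frequencies, d))        # column 5 total
sum_fd2 = sum(f * di ** 2 for f, di in zip(frequencies, d))  # column 6 total

# Transform back to the original units using the formulae of 9.4
mean = origin + width * sum_fd / n
std_dev = width * math.sqrt((sum_fd2 - sum_fd ** 2 / n) / (n - 1))

print(sum_fd, sum_fd2, round(mean, 3), round(std_dev, 3))
```

Working with the small integers of column 4 keeps the hand arithmetic short; only the last step involves the class width and the working origin.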
Serial   Moisture     Serial   Moisture     Serial   Moisture     Serial   Moisture
Number   ( Percent    Number   ( Percent    Number   ( Percent    Number   ( Percent
         by Mass )             by Mass )             by Mass )             by Mass )
(1)      (2)          (1)      (2)          (1)      (2)          (1)      (2)

141      2.4          168      3.4          195      2.5          223      2.8
142      2.5          169      1.5          196      2.3          224      3.7
143      2.7          170      4.3          197      1.2          225      3.2
144      0.7          171      2.8          198      2.1          226      [illegible]
145      2.0          172      2.9          199      3.0          227      3.1
146      4.8          173      2.0          200      2.5          228      2.2
147      3.7          174      2.7          201      1.4          229      2.3
148      1.6          175      3.7          202      3.0          230      3.6
149      2.6          176      3.0          203      3.3          231      3.8
150      2.8          177      3.6          204      2.0          232      3.6
151      2.2          178      0.7          205      1.8          233      3.2
152      2.2          179      1.4          206      3.2          234      2.0
153      2.8          180      1.8          207      3.1          235      2.8
154      2.9          181      2.0          208      3.6          236      3.5
155      2.7          182      1.0          209      3.1          237      2.3
156      2.0          183      0.9          210      3.6          238      3.8
157      1.8          184      1.5          211      3.4          239      1.8
158      1.8          185      1.0          212      3.2          240      2.6
159      0.8          186      1.2          213      2.6          241      2.0
160      5.3          187      2.3          214      2.8          242      2.4
161      5.3          188      3.1          215      2.6          243      2.7
162      3.7          189      3.8          216      2.7          244      3.1
163      4.1          190      4.3          217      1.8          245      2.8
164      3.9          191      2.0          218      2.7          246      1.8
165      4.4          192      3.1          219      2.6          247      2.2
166      2.5          193      1.6          220      2.8          248      4.5
167      1.7          194      2.5          221      3.7          249      3.1
                                            222      2.9          250      3.4
ANNEX B

(Clause 9.4.1)

PROFORMA FOR SUMMARIZATION OF DATA

Product:
Characteristic:
Unit of Measurement:

Class       Mid-Value   Transformed Value         Frequency         Frequency   f_i x_i       f_i x_i^2
Interval    x_i         d_i = ( x_i - x_0 )/c     ( Tally Marks )   f_i         or f_i d_i    or f_i d_i^2
(1)         (2)         (3)                       (4)               (5)         (6)           (7)

( rows to be filled in, one per class interval )

Total

$$\text{Mean} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \quad \text{or} \quad x_0 + c \times \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$$

$$\text{Standard deviation} = c \times \left[ \frac{\sum_{i=1}^{n} f_i d_i^2 - \left( \sum_{i=1}^{n} f_i d_i \right)^2 / \sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} f_i - 1} \right]^{1/2}$$